Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
Abstract
This paper presents the results of a study on the semantic constraints imposed on lexical choice by certain contextual indicators. We show how such indicators are computed and how correlations between them and the choice of a noun phrase description of a named entity can be automatically established using supervised learning. Based on this correlation, we have developed a technique for automatic lexical choice of descriptions of entities in text generation. We discuss the underlying relationship between the pragmatics of choosing an appropriate description that serves a specific purpose in the automatically generated text and the semantics of the description itself. We present our work in the framework of the more general concept of reuse of linguistic structures that are automatically extracted from large corpora. We present a formal evaluation of our approach and we conclude with some thoughts on potential applications of our method.

1 Introduction

Human writers constantly make deliberate decisions about how to express a given concept. These decisions are based on the topic of the text and the effect that the writer wants to achieve. Such contextual and pragmatic constraints are obvious to experienced writers, who produce context-specific text without much effort. However, for a computer to produce text in a similar way, either these constraints have to be added manually by an expert or the system must be able to acquire them automatically.

An example involving the lexical choice of an appropriate nominal description of a person makes this clear. Even though it seems intuitive that Bill Clinton should always be described with the NP "U.S. president" or a variation thereof, it turns out that many other descriptions appear in on-line news stories that characterize him in light of the topic of the article. For example, an article from 1996 on elections uses "Bill Clinton, the democratic presidential candidate", while a 1997 article on a false bomb alert in Little Rock, Ark. uses "Bill Clinton, an Arkansas native".

This paper presents the results of a study of the correlation between named entities (people, places, or organizations) and noun phrases used to describe them in a corpus. Intuitively, the use of a description is based on a deliberate decision on the part of the author of a piece of text. A writer is likely to select a description that puts the entity in the context of the rest of the article.

It is known that the distribution of words in a document is related to its topic (Salton and McGill, 1983). We have developed related techniques for approximating pragmatic constraints using words that appear in the immediate context of the entity. We will show that context influences the choice of a description, as do several other linguistic indicators. No single indicator provides enough empirical data to distinguish among all descriptions that are related to an entity. However, a carefully selected combination of such indicators provides enough information to pick an appropriate description with more than 80% accuracy.

Section 2 describes how we can automatically obtain enough constraints on the usage of descriptions. In Section 3, we show how such constructions are related to language reuse. In Section 4 we describe our experimental setup and the algorithms that we have designed. Section 5 includes a description of our results.
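The idea of using words in the immediate context of an entity to choose among its descriptions can be illustrated with a small sketch. The text above says only that supervised learning is used, so the learner here (a multinomial Naive Bayes classifier over context words) is an assumption chosen for brevity, not the method of the paper. The class name DescriptionChooser, the helper tokenize, and the training contexts are all hypothetical; only the three descriptions come from the Bill Clinton example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip('.,;:"()').lower() for w in text.split())
    return [w for w in words if w]


class DescriptionChooser:
    """Toy Naive Bayes model: context words -> description (assumed learner)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # description -> word counts
        self.desc_counts = Counter()             # description -> example count
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (context_text, description) pairs."""
        for context, description in examples:
            words = tokenize(context)
            self.desc_counts[description] += 1
            self.word_counts[description].update(words)
            self.vocab.update(words)

    def choose(self, context):
        """Return the description with the highest log-probability score."""
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(context) if w in self.vocab]
        total = sum(self.desc_counts.values())
        best, best_score = None, -math.inf
        for desc, n in self.desc_counts.items():
            counts = self.word_counts[desc]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best, best_score = desc, score
        return best


# Invented training contexts, for illustration only.
examples = [
    ("voters election campaign ballot polls", "the democratic presidential candidate"),
    ("election debate campaign rally voters", "the democratic presidential candidate"),
    ("bomb alert Little Rock police evacuated", "an Arkansas native"),
    ("Little Rock Arkansas hometown residents", "an Arkansas native"),
    ("White House summit treaty signed", "U.S. president"),
    ("White House budget Congress veto", "U.S. president"),
]

chooser = DescriptionChooser()
chooser.train(examples)
print(chooser.choose("a false bomb alert in Little Rock, Ark."))
```

With this toy data, a context about a bomb alert in Little Rock selects "an Arkansas native", while a context about an election campaign selects "the democratic presidential candidate". A realistic system would combine context words with the other linguistic indicators mentioned above, which this sketch does not model.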
Similar resources
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
This paper presents the results of a study on the semantic constraints imposed on lexical choice by certain contextual indicators. We show how such indicators are computed and how correlations between them and the choice of a noun phrase description of a named entity can be automatically established using supervised learning. Based on this correlation, we have developed a technique for automati...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Global Inference for Entity and Relation Identification via a Linear Programming Formulation
Natural language decisions often involve assigning values to sets of variables, representing low level decisions and context dependent disambiguation. In most cases there are complex relationships among these variables representing dependencies that range from simple statistical correlations to those that are constrained by deeper structural, relational and semantic properties of the text. In t...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Semantic Features of Math Problems: Relationships to Student Learning and Engagement
The creation of crowd-sourced content in learning systems is a powerful method for adapting learning systems to the needs of a range of teachers in a range of domains, but the quality of this content can vary. This study explores linguistic differences in teacher-created problem content in ASSISTments using a combination of discovery with models and correlation mining. Specifically, we find cor...
Publication date: 1998